SDF (chemistry File Format)

Chemical table file (CT File) is a family of text-based chemical file formats that describe molecules and chemical reactions. One format, for example, lists each atom in a molecule, the x-y-z coordinates of that atom, and the bonds among the atoms.


File formats

There are several file formats in the family. The formats were created by MDL Information Systems (MDL), which was acquired by Symyx Technologies, then merged with Accelrys Corp., and is now called BIOVIA, a subsidiary of Dassault Systèmes of Dassault Group. CT File is an open format in the sense that BIOVIA publishes its specification; however, BIOVIA requires users to register to download the CTFile format specifications.


Molfile

An MDL Molfile is a file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule. The molfile consists of a three-line header, then the Connection Table (CT), which holds the atom information followed by the bond connections and types, and then sections for more complex information. The molfile is sufficiently common that most, if not all, cheminformatics software systems and applications can read the format, though not always to the same degree. It is also supported by some computational software, such as Mathematica. The current de facto standard version is molfile V2000, although, more recently, the V3000 format has been circulating widely enough to present a potential compatibility issue for applications that are not yet V3000-capable.
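As an illustration of the block layout described above, the following Python sketch splits a V2000 molfile into its header, atom block, bond block and properties block by column position. It assumes a well-formed file with a three-line header and reads only the atom count, bond count and version stamp from the counts line. The `split_v2000` function, the `Atom` and `V2000Molfile` classes, and the ethanol sample (heavy atoms only, with a made-up program line) are hypothetical names chosen for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


@dataclass
class V2000Molfile:
    header: list[str]      # the three header lines (name, program line, comment)
    atoms: list[Atom]      # decoded atom block
    bond_lines: list[str]  # raw bond block, one line per bond
    properties: list[str]  # properties block lines before "M  END"


def split_v2000(text: str) -> V2000Molfile:
    """Split a V2000 molfile into its header, atom, bond and properties blocks."""
    lines = text.splitlines()
    header, counts = lines[:3], lines[3]
    # The version stamp occupies columns 34-39 of the counts line.
    if counts[33:39].strip() != "V2000":
        raise ValueError("counts line does not carry the V2000 version stamp")
    # Atom count in columns 1-3, bond count in columns 4-6.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    bond_start = 4 + n_atoms
    prop_start = bond_start + n_bonds

    # Atom lines: x, y, z in three 10-column fields, element symbol in columns 32-34.
    atoms = [
        Atom(symbol=line[31:34].strip(),
             x=float(line[0:10]), y=float(line[10:20]), z=float(line[20:30]))
        for line in lines[4:bond_start]
    ]
    bond_lines = lines[bond_start:prop_start]

    # Everything after the bond block up to "M  END" is the properties block.
    properties = []
    for line in lines[prop_start:]:
        if line.startswith("M  END"):
            break
        properties.append(line)
    return V2000Molfile(header, atoms, bond_lines, properties)


# Hypothetical sample: ethanol heavy atoms only (C-C-O), 2D coordinates.
ETHANOL = "\n".join([
    "ethanol",
    "  hypothetical-program",
    "",
    "  3  2" + "  0" * 8 + "999 V2000",  # aaa bbb, eight zero fields, mmm=999, stamp
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.5981    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "M  END",
])

mol = split_v2000(ETHANOL)
print(mol.header[0], [a.symbol for a in mol.atoms], len(mol.bond_lines))
# ethanol ['C', 'C', 'O'] 2
```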


Counts line block specification


Bond block specification

The Bond Block is made up of bond lines, one line per bond, with the following format: 111222tttsssxxxrrrccc (seven fixed-width fields of three columns each), where the values are described in the following table:
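Because every bond-line field is three columns wide, a bond line can be decoded by slicing rather than by splitting on whitespace, which would fail when two three-digit atom numbers (for example 100 and 101) run together as `100101`. The Python sketch below assumes the 111 222 ttt sss xxx rrr ccc field order; the bond-type names in `BOND_TYPES` reflect our reading of the CTfile specification, and `parse_bond_line`, `BondLine` and `_field` are hypothetical names.

```python
from typing import NamedTuple

# Bond-type (ttt) codes, as we read them in the CTfile specification.
BOND_TYPES = {1: "single", 2: "double", 3: "triple", 4: "aromatic",
              5: "single or double", 6: "single or aromatic",
              7: "double or aromatic", 8: "any"}


class BondLine(NamedTuple):
    first_atom: int       # 111
    second_atom: int      # 222
    bond_type: int        # ttt
    stereo: int           # sss
    topology: int         # rrr (xxx is unused and skipped)
    reacting_center: int  # ccc


def _field(line: str, index: int) -> int:
    """Return the index-th three-column field as an int; a blank or missing field gives 0."""
    chunk = line[3 * index:3 * index + 3].strip()
    return int(chunk) if chunk else 0


def parse_bond_line(line: str) -> BondLine:
    # Some writers omit trailing fields, so missing ones default to 0.
    return BondLine(
        first_atom=_field(line, 0),
        second_atom=_field(line, 1),
        bond_type=_field(line, 2),
        stereo=_field(line, 3),
        topology=_field(line, 5),
        reacting_center=_field(line, 6),
    )


bond = parse_bond_line("  1  2  2  0  0  0  0")
print(bond.first_atom, bond.second_atom, BOND_TYPES[bond.bond_type])  # 1 2 double

# Three-digit atom numbers run together; whitespace splitting would mangle them.
big = parse_bond_line("100101  1  0")
print(big.first_atom, big.second_atom)  # 100 101
```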


Extended Connection Table (V3000)

The extended (V3000) molfile consists of a regular molfile “no structure” followed by a single molfile appendix that contains the body of the connection table (Ctab). The following figure shows both an alanine structure and the extended molfile corresponding to it. Note that the “no structure” is flagged with the “V3000” instead of the “V2000” version stamp. There are two other changes to the header in addition to the version:
* The number of appendix lines is always written as 999, regardless of how many there actually are. (All current readers will disregard the count and stop at M  END.)
* The “dimensional code” is maintained more explicitly. Thus “3D” really means 3D, although “2D” will be interpreted as 3D if any non-zero Z-coordinates are found.

Unlike the V2000 molfile, the V3000 extended Rgroup molfile has the same header format as a non-Rgroup molfile.
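A reader can tell the two versions apart from the version stamp on the counts line and, for V3000, collect the `M  V30` lines of the appendix while ignoring the 999 appendix-line count and stopping at `M  END`, as described above. The Python sketch below does this under a few assumptions: the counts line is the fourth line, and a V3000 line ending in `-` continues on the next `M  V30` line. The `molfile_version` and `v3000_body` functions and the methanol sample are hypothetical.

```python
def molfile_version(text: str) -> str:
    """Return the version stamp ('V2000' or 'V3000') from the counts line (fourth line)."""
    return text.splitlines()[3][33:39].strip()


def v3000_body(text: str) -> list[str]:
    """Collect the V3000 appendix lines without their 'M  V30 ' prefix.

    The appendix-line count on the counts line (always 999 for V3000) is
    ignored; reading stops at 'M  END'. A line ending in '-' is assumed to
    continue on the next 'M  V30 ' line.
    """
    prefix = "M  V30 "
    body: list[str] = []
    pending = ""
    for line in text.splitlines()[4:]:
        if line.startswith("M  END"):
            break
        if not line.startswith(prefix):
            continue  # not part of the extended connection table
        content = line[len(prefix):]
        if content.endswith("-"):
            pending += content[:-1]  # drop the hyphen and wait for the rest
            continue
        body.append(pending + content)
        pending = ""
    return body


# Hypothetical sample: methanol heavy atoms (C-O) in the extended format.
METHANOL_V3000 = "\n".join([
    "methanol",
    "  hypothetical-program",
    "",
    "  0" * 10 + "999 V3000",  # "no structure" counts line with the V3000 stamp
    "M  V30 BEGIN CTAB",
    "M  V30 COUNTS 2 1 0 0 0",
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0 0 0 0",
    "M  V30 2 O 1.4 0 -",      # continued on the next line
    "M  V30 0 0",
    "M  V30 END ATOM",
    "M  V30 BEGIN BOND",
    "M  V30 1 1 1 2",
    "M  V30 END BOND",
    "M  V30 END CTAB",
    "M  END",
])

print(molfile_version(METHANOL_V3000))  # V3000
for entry in v3000_body(METHANOL_V3000):
    print(entry)
```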


Counts line

A counts line is required, and must be first. It specifies the number of atoms, bonds, 3D objects, and Sgroups. It also specifies whether or not the CHIRAL flag is set. Optionally, the counts line can specify molregno. This is only used when the regno exceeds 999999 (the limit of the format in the molfile header line). The format of the counts line is:

M  V30 COUNTS na nb nsg n3d chiral [REGNO=regno]
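The sketch below parses a V3000 counts line of the form `M  V30 COUNTS na nb nsg n3d chiral [REGNO=regno]`, where the five numbers are the atom, bond, Sgroup and 3D-object counts followed by the chiral flag, and the `REGNO=` field is optional. Treating a chiral value of 1 as "flag set" is an assumption, and `parse_v3000_counts` and `V3000Counts` are hypothetical names.

```python
from typing import NamedTuple, Optional


class V3000Counts(NamedTuple):
    n_atoms: int
    n_bonds: int
    n_sgroups: int
    n_3d: int
    chiral: bool
    regno: Optional[int]  # only written when the regno exceeds 999999


def parse_v3000_counts(line: str) -> V3000Counts:
    """Parse 'M  V30 COUNTS na nb nsg n3d chiral [REGNO=regno]'."""
    prefix = "M  V30 COUNTS "
    if not line.startswith(prefix):
        raise ValueError("not a V3000 counts line")
    tokens = line[len(prefix):].split()
    # The first five tokens are positional integers.
    na, nb, nsg, n3d, chiral = (int(t) for t in tokens[:5])
    regno = None
    for token in tokens[5:]:
        key, _, value = token.partition("=")
        if key.upper() == "REGNO":
            regno = int(value)
    return V3000Counts(na, nb, nsg, n3d, chiral == 1, regno)


print(parse_v3000_counts("M  V30 COUNTS 2 1 0 0 0"))
print(parse_v3000_counts("M  V30 COUNTS 6 5 0 0 1 REGNO=1234567"))
```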


SDF

SDF is one of a family of chemical-data file formats developed by MDL; it is intended especially for structural information. "SDF" stands for structure-data file, and SDF files actually wrap the molfile (MDL Molfile) format. Multiple records are delimited by lines consisting of four dollar signs ($$$$). A feature of the SDF format is its ability to include associated data. Each associated data item starts with a header line beginning with ">", which normally gives the field name in angle brackets, followed by one or more value lines and a blank line. Associated data items are denoted as follows: > XCA3464366 > 5.825 > Sigma > 499.611
Multiple-line data items are also supported. The MDL SDF-format specification requires that a hard-carriage-return character be inserted if a single line of any text field exceeds 200 characters. This requirement is frequently violated in practice, as many SMILES and InChI strings exceed that length.
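The record and data-item structure of an SD file can be read with a short Python sketch. It assumes each record holds a molfile terminated by `M  END`, that every data item is a `>` header line (with the field name, if any, in angle brackets) followed by value lines and a blank line, and that records end with `$$$$`. It deliberately does not enforce the 200-character line limit, since real files often exceed it. The `read_sdf` and `_split_record` functions and the `ID`/`Note` fields in the sample are hypothetical.

```python
import re

# A data header line starts with '>' and usually names the field in angle brackets.
_FIELD_NAME = re.compile(r"<([^>]*)>")


def _split_record(lines: list[str]) -> tuple[str, dict[str, str]]:
    """Separate one record into its molfile text and its associated data items."""
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    molfile = "\n".join(lines[:end + 1])

    data: dict[str, str] = {}
    name = None
    values: list[str] = []
    for line in lines[end + 1:]:
        if name is None:
            if line.startswith(">"):          # start of a data item
                match = _FIELD_NAME.search(line)
                name = match.group(1) if match else line[1:].strip()
                values = []
        elif line.strip() == "":              # blank line ends the item
            data[name] = "\n".join(values)
            name = None
        else:                                 # value line; items may span several
            values.append(line)
    if name is not None:                      # tolerate a missing final blank line
        data[name] = "\n".join(values)
    return molfile, data


def read_sdf(text: str) -> list[tuple[str, dict[str, str]]]:
    """Split SDF text on '$$$$' lines into (molfile, data) pairs."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_split_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append(_split_record(current))
    return records


# Hypothetical two-record SDF; field names and values are made up for illustration.
SAMPLE = "\n".join([
    "methane", "  hypothetical-program", "",
    "  1  0" + "  0" * 8 + "999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <ID>", "HYPOTHETICAL-1", "",
    "> <Note>", "first line", "second line", "",
    "$$$$",
    "water", "  hypothetical-program", "",
    "  1  0" + "  0" * 8 + "999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <ID>", "HYPOTHETICAL-2", "",
    "$$$$",
])

for molfile, data in read_sdf(SAMPLE):
    print(molfile.splitlines()[0], data)
```

Multi-line values are kept with their line breaks, so a value such as the `Note` field above comes back as two lines joined by a newline.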


Other formats of the family

There are other, less commonly used formats of the family:
* RXNFile - for representing a single chemical reaction;
* RDFile - for representing a list of records with associated data. Each record can contain chemical structures, reactions, textual and tabular data;
* RGFile - for representing Markush structures (deprecated, since Molfile V3000 can represent Markush structures);
* XDFile - for representing chemical information in XML format.


See also

* Chemical file format: converting between formats



External links


SDF Prò
paid software to process SD files (SDF) from Adroit DI

SDF Toolkit
free software to process SD files (SDF).
NCI/CADD Chemical Identifier Resolver
generates SD files (SDF) from chemical names, CAS Registry Numbers, SMILES, InChI, InChIKey, ....
KNIME
free software to manipulate data and do data mining; it can also read and write SD files (SDF).
Comparative Toxicology Dashboard
service provided by the Environmental Protection Agency (EPA) which generates SD files (SDF) from chemical names, CAS Registry Numbers, SMILES, InChI, InChIKey, ...